A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation

Shakouri, David Ph., Cremers, Crit, Schiller, Niels O.

arXiv.org Artificial Intelligence

This paper presents an initial study performed with the MODOMA system. MODOMA is a computational multi-agent laboratory environment for unsupervised language acquisition experiments in which acquisition is based on the interaction between two language models, an adult and a child agent. Although this framework employs statistical as well as rule-based procedures, the result of language acquisition is a knowledge-based language model, which can be used to generate and parse new utterances of the target language. The system is fully parametrized: researchers can control all aspects of the experiments, while the results of language acquisition, that is, the acquired grammatical knowledge, are explicitly represented and can be consulted. Thus, the system introduces novel possibilities for conducting computational language acquisition experiments. The experiments presented in this paper demonstrate that functional and content categories can be acquired and represented by the child agent based on training and test data containing different numbers of exemplars generated by the adult agent. Interestingly, patterns that are well-established for human-generated data are also found for these machine-generated data. As the procedures resulted in the successful acquisition of discrete grammatical categories by the child agent, these experiments substantiate the validity of the MODOMA approach to modelling language acquisition.


A Stylometric Application of Large Language Models

Stropkay, Harrison F., Chen, Jiayi, Latifi, Mohammad J., Rockmore, Daniel N., Manning, Jeremy R.

arXiv.org Artificial Intelligence

We show that large language models (LLMs) can be used to distinguish the writings of different authors. Specifically, an individual GPT-2 model, trained from scratch on the works of one author, will predict held-out text from that author more accurately than held-out text from other authors. We suggest that, in this way, a model trained on one author's works embodies the unique writing style of that author. We first demonstrate our approach on books written by eight different (known) authors. We also use this approach to confirm R. P. Thompson's authorship of the well-studied 15th book of the Oz series, originally attributed to L. F. Baum.
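The attribution rule described above can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it substitutes add-one-smoothed character-trigram frequencies for the per-author GPT-2 models (the trigram order and the `vocab_size` smoothing constant are our assumptions). Held-out text is attributed to the author whose model assigns it the lowest cross-entropy, which is the decision rule the abstract describes.

```python
import math
from collections import Counter

def trigram_counts(text):
    # Character-trigram frequencies as a toy per-author "language model".
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cross_entropy(model, text, vocab_size=50_000):
    # Add-one-smoothed negative log-likelihood per trigram; lower = better fit.
    total = sum(model.values())
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    return -sum(math.log((model[g] + 1) / (total + vocab_size))
                for g in grams) / len(grams)

def attribute(held_out, models):
    # Attribute held-out text to the author whose model predicts it best.
    return min(models, key=lambda a: cross_entropy(models[a], held_out))
```

With real per-author GPT-2 models, `cross_entropy` would be replaced by the model's held-out loss; the argmin decision rule stays the same.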


What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems

Mori, Kiyotada, Kawano, Seiya, Liu, Chaoran, Ishi, Carlos Toshinori, Contreras, Angel Fernando Garcia, Yoshino, Koichiro

arXiv.org Artificial Intelligence

Spoken dialogue systems (SDSs) utilize automatic speech recognition (ASR) at the front end of their pipeline. The role of ASR in SDSs is to appropriately recognize the information in user speech that is relevant to response generation. Examining selective listening in humans, which refers to the ability to focus on and listen to the important parts of a conversation during speech, will enable us to identify the ASR capabilities required for SDSs and evaluate them. In this study, we experimentally confirmed selective listening during human dialogue response generation by comparing human transcriptions produced while generating dialogue responses against reference transcriptions. Based on our experimental results, we discuss the possibility of a new ASR evaluation method that leverages human selective listening and can identify the gap in transcription ability between ASR systems and humans.
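The comparison of human transcriptions against reference transcriptions can be sketched in miniature. This is a toy illustration under assumed simplifications (whitespace tokenization and a multiset difference); a real evaluation would presumably use sequence alignment, as in standard WER computation.

```python
from collections import Counter

def omitted_words(reference, human):
    # Multiset difference: reference words the human left out while
    # formulating a response; a rough proxy for "unattended" content.
    ref = Counter(reference.lower().split())
    hum = Counter(human.lower().split())
    return sorted((ref - hum).elements())

def retention_rate(reference, human):
    # Fraction of reference words the human retained.
    ref_len = len(reference.split())
    return (ref_len - len(omitted_words(reference, human))) / ref_len
```

Under this framing, an ASR system could be scored against the words humans actually retain rather than against the full reference.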


Geometric Structures and Patterns of Meaning: A PHATE Manifold Analysis of Chinese Character Embeddings

Gong, Wen G.

arXiv.org Artificial Intelligence

We systematically investigate geometric patterns in Chinese character embeddings using PHATE manifold analysis. Through cross-validation across seven embedding models and eight dimensionality reduction methods, we observe clustering patterns for content words (实词) and branching patterns for function words (虚词). Analysis of 1000+ characters across 12 semantic domains reveals that geometric complexity correlates with semantic content: meaningful characters exhibit rich geometric diversity while structural radicals collapse into tight clusters. A comprehensive 子-network analysis (123 phrases) demonstrates systematic semantic expansion from the fundamental element character 子. These findings provide computational evidence supporting traditional linguistic theory and establish a novel framework for geometric analysis of semantic organization.
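The contrast between "rich geometric diversity" and "tight clusters" can be quantified on any set of reduced embedding coordinates. Below is a minimal sketch assuming a simple dispersion measure of our own choosing (mean distance to the centroid), not necessarily the paper's metric; the point coordinates stand in for PHATE-reduced embeddings.

```python
def dispersion(points):
    # Mean Euclidean distance to the centroid: low values indicate a
    # tight cluster, high values indicate geometric diversity.
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    dists = [sum((p[i] - centroid[i]) ** 2 for i in range(dim)) ** 0.5
             for p in points]
    return sum(dists) / len(dists)
```

Comparing `dispersion` over the reduced coordinates of content characters versus structural radicals would reproduce the tight-cluster/diverse-spread contrast the abstract reports.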


Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity

Seo, Yeongbin, Kim, Gayoung, Kim, Jaehyung, Yeo, Jinyoung

arXiv.org Artificial Intelligence

As large language models (LLMs) are pretrained on massive web corpora, careful selection of data becomes essential to ensure effective and efficient learning. While perplexity (PPL)-based filtering has shown strong performance, it suffers from drawbacks: substantial time costs and inherent unreliability of the model when handling noisy or out-of-distribution samples. In this work, we propose a simple yet powerful alternative: a prior-based data filtering method that estimates token priors using corpus-level term frequency statistics, inspired by linguistic insights on word roles and lexical density. Our approach filters documents based on the mean and standard deviation of token priors, serving as a fast proxy to PPL while requiring no model inference. Despite its simplicity, the prior-based filter achieves the highest average performance across 20 downstream benchmarks, while reducing time cost by over 1000x compared to PPL-based filtering. We further demonstrate its applicability to symbolic languages such as code and math, and its dynamic adaptability to multilingual corpora without supervision.
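The filtering rule described above can be sketched as follows. The abstract specifies only that documents are filtered on the mean and standard deviation of token priors estimated from corpus-level term frequencies; the log-scaling, the unknown-token floor, and the specific thresholds below are our illustrative assumptions.

```python
import math
from collections import Counter

def token_priors(corpus_docs):
    # Corpus-level term-frequency estimates: P(token) = count / total tokens.
    counts = Counter(tok for doc in corpus_docs for tok in doc.split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def prior_stats(doc, priors, floor=1e-9):
    # Mean and standard deviation of log-priors over the document's tokens.
    logs = [math.log(priors.get(tok, floor)) for tok in doc.split()]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, var ** 0.5

def keep(doc, priors, mean_range, max_std):
    # Hypothetical filtering rule: keep documents whose prior statistics
    # fall inside a plausible band; extreme values suggest noise.
    mean, std = prior_stats(doc, priors)
    return mean_range[0] <= mean <= mean_range[1] and std <= max_std
```

Because this requires only a term-frequency pass over the corpus and no model forward passes, it illustrates why the method can be orders of magnitude faster than PPL-based filtering.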


ChatGPT-generated texts show authorship traits that identify them as non-human

Dentella, Vittoria, Huang, Weihang, Mansi, Silvia Angela, Grieve, Jack, Leivada, Evelina

arXiv.org Artificial Intelligence

Large Language Models can emulate different writing styles, ranging from composing poetry that appears indistinguishable from that of famous poets to using slang that can convince people that they are chatting with a human online. While differences in style may not always be visible to the untrained eye, we can generally distinguish the writing of different people, like a linguistic fingerprint. This work examines whether a language model can also be linked to a specific fingerprint. Through stylometric and multidimensional register analyses, we compare human-authored and model-authored texts from different registers. We find that the model can successfully adapt its style depending on whether it is prompted to produce a Wikipedia entry vs. a college essay, but not in a way that makes it indistinguishable from humans. Concretely, the model shows more limited variation when producing outputs in different registers. Our results suggest that the model prefers nouns to verbs, thus showing a distinct linguistic backbone from humans, who tend to anchor language in the highly grammaticalized dimensions of tense, aspect, and mood. It is possible that the more complex domains of grammar reflect a mode of thought unique to humans, thus acting as a litmus test for Artificial Intelligence.

Introduction

Scholars from different disciplines have been addressing the question of what makes us human for centuries. For Nobel laureate Bertrand Russell, the answer is language, for "no matter how eloquently a dog may bark, he cannot tell you that his parents were poor but honest". Human language is both flexible and constrained at the same time, and this is why the Turing Test, described as a litmus test for Artificial Intelligence [Shieber 1994, French 2000], is linked to achieving a level of conversational proficiency that is highly complex, akin to that of a human [Turing 1950].

Human language is flexible in the sense that we all make different choices when conversing. Every human is thought to have a distinct linguistic fingerprint called idiolect [Halliday et al. 1964, Coulthard 2004]. This idiolect, which can be defined as an individual's unique use of linguistic forms (including lexical choices, collocations and fixed expressions, punctuation patterns, misspellings, and grammatical style), is critical for authorship attribution in a range of situations: from identifying that a poem with dashes, elliptical syntax, and unconventional capitalization is more likely authored by Emily Dickinson and not by William Shakespeare, to pinning down a person of interest in the course of a criminal investigation, as happened in the Unabomber case.


A Computational Approach to Analyzing Language Change and Variation in the Constructed Language Toki Pona

Huang, Daniel, Joo, Hyoun-A

arXiv.org Artificial Intelligence

This study explores language change and variation in Toki Pona, a constructed language with approximately 120 core words. Taking a computational and corpus-based approach, the study analyzes features including fluid word classes and transitivity in order to examine (1) changes in the preferences of content words for different syntactic positions over time and (2) variation in usage across different corpora. The results suggest that sociolinguistic factors influence Toki Pona in the same way as natural languages, and that even constructed linguistic systems naturally evolve as communities use them.
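A content word's positional preference can be counted in a simple way by locating it relative to Toki Pona's predicate marker "li", which separates the subject from the predicate. This is a hypothetical sketch, not the study's procedure, and it deliberately ignores the rule that "li" is dropped after "mi" and "sina".

```python
def position_preferences(word, sentences):
    # Count whether `word` occurs before the predicate marker "li"
    # (subject position) or after it (predicate position).
    prefs = {"subject": 0, "predicate": 0}
    for s in sentences:
        toks = s.split()
        if word in toks and "li" in toks:
            key = "subject" if toks.index(word) < toks.index("li") else "predicate"
            prefs[key] += 1
    return prefs
```

Tallying such counts per corpus and per time slice is one way the positional-preference changes described in the abstract could be operationalized.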


Dynamik: Syntactically-Driven Dynamic Font Sizing for Emphasis of Key Information

Nishida, Naoto, Ishiguro, Yoshio, Rekimoto, Jun, Yamashita, Naomi

arXiv.org Artificial Intelligence

In today's globalized world, there are increasing opportunities for individuals to communicate using a common non-native language (lingua franca). Non-native speakers often have opportunities to listen to foreign languages, but may not comprehend them as fully as native speakers do. To aid real-time comprehension, live transcription of subtitles is frequently used in everyday life (e.g., during Zoom conversations, watching YouTube videos, or on social networking sites). However, simultaneously reading subtitles while listening can increase cognitive load. In this study, we propose Dynamik, a system that reduces cognitive load during reading by decreasing the size of less important words and enlarging important ones, thereby enhancing sentence contrast. Our results indicate that Dynamik can reduce certain aspects of cognitive load, specifically perceived performance and effort, and enhance the sense of comprehension, particularly among participants with low English proficiency. We further discuss our method's applicability to other languages, potential improvements, and further research directions.
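The core rendering idea, scaling font size with word importance, can be sketched compactly. The linear mapping and the `base`/`spread` parameters below are illustrative assumptions; the actual system derives importance syntactically, whereas here the scores are taken as given.

```python
def font_sizes(words, importance, base=16, spread=6):
    # Linearly map importance scores to point sizes in
    # [base - spread/2, base + spread/2]; uniform scores get the base size.
    lo, hi = min(importance), max(importance)
    if hi == lo:
        return {w: float(base) for w in words}
    return {w: round(base - spread / 2 + spread * (s - lo) / (hi - lo), 1)
            for w, s in zip(words, importance)}
```

A subtitle renderer would then emit each word at its computed size, making key information visually salient without altering the text itself.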


Implicit In-Context Learning: Evidence from Artificial Language Experiments

Ma, Xiaomeng, Xu, Qihui

arXiv.org Artificial Intelligence

Humans acquire language through implicit learning, absorbing complex patterns without explicit awareness. While LLMs demonstrate impressive linguistic capabilities, it remains unclear whether they exhibit human-like pattern recognition during in-context learning at the inference level. We adapted three classic artificial language learning experiments spanning morphology, morphosyntax, and syntax to systematically evaluate implicit learning at the inference level in two state-of-the-art OpenAI models: gpt-4o and o3-mini. Our results reveal linguistic domain-specific alignment between models and human behaviors: o3-mini aligns better in morphology, while both models align in syntax.